
Cache snapshotting performance improvements #7830

Merged · 8 commits · Jan 12, 2017

Conversation

@jwilder (Contributor) commented Jan 12, 2017

Required for all non-trivial PRs
  • Rebased/mergeable
  • Tests pass
  • CHANGELOG.md updated

This PR has a few changes to improve cache snapshotting throughput. Under heavy write load, snapshots can start to back up. This can cause 500s to be returned with a cache-max-memory-size error, as well as more frequent write timeouts due to the cache being write-locked for longer periods of time.

The images below are examples of the 1.1 cache and heap memory sizes under very heavy write load (~2M/s). They were run on a c4.8xlarge (36 cores / 64 GB RAM) with both 100k series and 1M series.

The sawtooth pattern below shows the snapshots getting progressively larger and taking longer to complete. At the peaks, write errors for cache-max-memory-size being exceeded could be returned, depending on how the limit is configured. As the snapshots get larger, timeouts also start to occur when compactions kick in and use up CPU.

[screenshot: 1.1 cache size under heavy write load, showing the sawtooth pattern]

Heap also continues to grow in this case:
[screenshot: 1.1 heap size growth under the same load]

With this PR, the same write load now looks like:

[screenshot: cache size under the same write load with this PR]

[screenshot: heap size under the same write load with this PR]

A longer 10B point run across 1M series looks similar, whereas 1.1 would start hitting 500s fairly quickly.

[screenshot: cache size during the 10B point, 1M series run]

In both cases, the cache size stayed well under the default 1GB max memory size.

The actual changes in this PR are the following:

  • Cache.Snapshot was simplified to just swap the hot cache with the snapshot cache instead of iterating over every key and copying values. This reduces lock contention because the cache is write-locked for less time.
  • CacheKeyIterator encodes keys in parallel. Previously, writing a snapshot was single-threaded and would iterate over each key in order and encode all of its values. It now encodes all the keys concurrently while the writer goroutine writes them out to disk, in order, as soon as the encoded blocks are ready. This lets snapshotting leverage all available cores.
  • Fix cache size getting incremented too early. If the cache limit was exceeded, the cache size counter would continue to grow with every failed write. When the snapshot completed, it would sometimes go negative.
  • Fix an issue where compaction scheduling would sometimes stop. I'm not exactly sure how this happened locally, but the cache compactions for a shard just got stuck on a Timer.C channel that never fired. This code was reusing a Timer incorrectly according to the docs, so it has been switched to a Ticker instead.

With these changes, the chances of getting a cache-max-memory-size limit error decrease significantly. Timeouts can still occur depending on system load, so increasing write-timeout may still be needed in some cases, but they should occur much less frequently.

This simplifies the cache.Snapshot func to swap the hot cache into the snapshot cache instead of copying and appending entries. This reduces the amount of time the cache is write-locked, which should reduce cache contention for the read-only code paths.
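As a rough illustration of the swap approach, here is a minimal, self-contained sketch; the type and field names (store, size, snapshot) are placeholders, not the actual tsm1 types:

```go
// Sketch of swapping the hot cache into the snapshot instead of copying
// entries. All names here are illustrative.
package cache

import "sync"

// store maps a series key to its encoded value blocks (placeholder type).
type store map[string][][]byte

type Cache struct {
	mu       sync.RWMutex
	store    store
	size     uint64
	snapshot *Cache
}

// Snapshot hands the hot store to the snapshot cache and installs a fresh,
// empty store for incoming writes. Because there is no per-key iteration or
// value copying, the write lock is held only briefly.
func (c *Cache) Snapshot() *Cache {
	c.mu.Lock()
	defer c.mu.Unlock()

	if c.snapshot == nil {
		c.snapshot = &Cache{store: store{}}
	}

	c.snapshot.store, c.store = c.store, store{}
	c.snapshot.size, c.size = c.size, 0

	return c.snapshot
}
```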
The CacheKeyIterator (used for snapshot compactions) iterated over each key and serially encoded the values for that key as the TSM file was written. With many series, this can be slow and will only use one CPU core even if more are available.

This changes it so that the key space is split amongst a number of goroutines that encode all keys in parallel to improve throughput.
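A self-contained sketch of that fan-out pattern, assuming a per-key "ready" channel so the writer can emit blocks in key order as soon as each one is encoded (the key set and the encodeValues helper are made up for illustration):

```go
// Sketch: workers encode keys concurrently; the writer consumes in key order.
package main

import (
	"fmt"
	"runtime"
	"sort"
	"sync/atomic"
)

// encodeValues stands in for the real per-key block encoding.
func encodeValues(key string) []byte { return []byte("encoded:" + key) }

func main() {
	cache := map[string][]byte{"cpu": nil, "mem": nil, "disk": nil}

	// Sort the key space once so the writer can emit blocks in key order.
	keys := make([]string, 0, len(cache))
	for k := range cache {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	blocks := make([][]byte, len(keys))
	ready := make([]chan struct{}, len(keys))
	for i := range ready {
		ready[i] = make(chan struct{}, 1)
	}

	// Fan out: one goroutine per core claims key indexes via an atomic
	// counter and encodes them independently.
	var next int64 = -1
	for w := 0; w < runtime.GOMAXPROCS(0); w++ {
		go func() {
			for {
				i := int(atomic.AddInt64(&next, 1))
				if i >= len(keys) {
					return
				}
				blocks[i] = encodeValues(keys[i])
				ready[i] <- struct{}{} // signal the writer this block is done
			}
		}()
	}

	// The writer walks keys strictly in order, blocking only until the next
	// needed block has been encoded.
	for i := range keys {
		<-ready[i]
		fmt.Printf("write %s -> %s\n", keys[i], blocks[i])
	}
}
```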
The memory stats, as well as the size of the cache, were not accurate. There was also a problem where the cache size would be increased optimistically but, if the cache size limit was hit, never decreased. This caused the cache size to grow without bound with every failed write.
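A sketch of the accounting fix, assuming an optimistic atomic reservation that is rolled back when the limit check fails; the Cache fields and error value are illustrative, not the real influxdb types:

```go
// Sketch of rolling back the optimistic size reservation on a rejected write.
package cache

import (
	"errors"
	"sync"
	"sync/atomic"
)

var ErrCacheMemorySizeLimitExceeded = errors.New("cache-max-memory-size exceeded")

type Cache struct {
	mu           sync.Mutex
	store        map[string][]byte // assumed initialized elsewhere
	size         uint64            // bytes in the hot cache
	snapshotSize uint64            // bytes held by the current snapshot
	maxSize      uint64            // configured cache-max-memory-size (0 = unlimited)
}

func (c *Cache) Write(key string, values []byte) error {
	added := uint64(len(values))

	// Optimistically reserve the space.
	newSize := atomic.AddUint64(&c.size, added)

	// Enforce the limit. On rejection, give the reservation back so repeated
	// failed writes can no longer inflate the counter without bound.
	if c.maxSize > 0 && newSize+atomic.LoadUint64(&c.snapshotSize) > c.maxSize {
		atomic.AddUint64(&c.size, ^(added - 1)) // atomic subtract of `added`
		return ErrCacheMemorySizeLimitExceeded
	}

	c.mu.Lock()
	c.store[key] = append(c.store[key], values...)
	c.mu.Unlock()
	return nil
}
```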
I ran into an issue where cache snapshotting seemed to stop completely, causing the cache to fill up and never recover. I believe this is due to the Timer being reused incorrectly. Instead, use a Ticker, which fires regularly and does not require the (incorrect) resetting logic.
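A minimal, runnable sketch of the Ticker-driven loop, with an illustrative interval and function name; a Ticker fires repeatedly on its own, so there is no Reset/drain sequence to get wrong:

```go
// Sketch: periodic compaction check driven by a time.Ticker.
package main

import (
	"fmt"
	"time"
)

func compactCacheLoop(done <-chan struct{}) {
	// Unlike a manually reset Timer, a Ticker keeps firing on its own,
	// so the loop cannot silently stop receiving on the channel.
	t := time.NewTicker(time.Second)
	defer t.Stop()

	for {
		select {
		case <-done:
			return
		case <-t.C:
			// Check whether the cache should be snapshotted/compacted.
			fmt.Println("checking cache for snapshot compaction")
		}
	}
}

func main() {
	done := make(chan struct{})
	go compactCacheLoop(done)
	time.Sleep(3 * time.Second)
	close(done)
}
```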
@jwilder added this to the 1.2.0 milestone Jan 12, 2017
@jwilder requested a review from e-dard January 12, 2017 16:02
```go
func (c *cacheKeyIterator) encode() {
	concurrency := runtime.NumCPU()
```
Contributor:

Shouldn't this be runtime.GOMAXPROCS(0) instead of runtime.NumCPU, to respect the GOMAXPROCS environment setting?

As a side note, there's a few other places in the tsm1 package where we're using NumCPU. I think those ought to all be GOMAXPROCS.

Contributor Author:

I used NumCPU because it sounds like GOMAXPROCS might go away at some point: https://golang.org/pkg/runtime/#GOMAXPROCS. I also assumed that NumCPU is the same as GOMAXPROCS(0). That might be incorrect though.

Contributor Author:

Looks like NumCPU is different than GOMAXPROCS(0). Will update to use GOMAXPROCS(0).

Contributor:

If and when it does go away, gofix will be updated to minimize the effort to update our code.

> The GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously.

There may be situations where someone would want to set it lower, possibly in conjunction with taskset to deliberately limit how many CPUs influxdb can use. In other situations, we as developers may want to increase GOMAXPROCS to shake out contention issues.
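For reference, a small runnable comparison: runtime.NumCPU reports the machine's logical CPU count, while runtime.GOMAXPROCS(0) returns the current setting without changing it, so it reflects a GOMAXPROCS environment override:

```go
// Demonstrates the difference between NumCPU and GOMAXPROCS(0).
package main

import (
	"fmt"
	"runtime"
)

func main() {
	fmt.Println("NumCPU:        ", runtime.NumCPU())
	fmt.Println("GOMAXPROCS(0): ", runtime.GOMAXPROCS(0))

	// Sizing worker pools off GOMAXPROCS(0) respects an operator-imposed
	// limit, e.g. running with GOMAXPROCS=4 set in the environment.
	concurrency := runtime.GOMAXPROCS(0)
	fmt.Println("encoder goroutines:", concurrency)
}
```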

```go
			wg.Done()
		}()
	}
	wg.Wait()
```
Contributor:

Because this method is only called in its own goroutine, the WaitGroup seems unnecessary.

Contributor Author:

Good point. Will update.

@e-dard (Contributor) commented Jan 12, 2017

@jwilder I've added 73ed864

@jwilder merged commit 7e7f0d0 into master Jan 12, 2017
@jwilder deleted the jw-cache-snapshot branch January 12, 2017 19:11